Jupyter Data Science Workflow

From exploratory analysis to reproducible science

Jake VanderPlas

University of Washington eScience Institute


In [1]:
%matplotlib inline
import matplotlib.pyplot as plt
plt.style.use('seaborn-v0_8')  # this style was renamed from 'seaborn' in Matplotlib 3.6

In [2]:
# In Jupyter/IPython, appending ? to an object (e.g. pd.DataFrame?) shows its documentation;
# appending ?? (e.g. pd.DataFrame??) also shows its source code, when available.

from jupyterworkflow.data import get_data
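
The jupyterworkflow.data module itself is not shown in this notebook. As a rough idea of what such a loader might do, here is a minimal sketch, assuming a CSV with a Date column followed by exactly two hourly count columns (west sidewalk, then east sidewalk). The default filename fremont.csv and the url and force_download parameters are hypothetical. The sketch downloads the file only when no local copy exists, then adds a Total column like the one in Out[3].

```python
import os
from urllib.request import urlretrieve

import pandas as pd


def get_data(filename='fremont.csv', url=None, force_download=False):
    """Hypothetical loader: return hourly counts with West, East, Total columns.

    Assumes the CSV has a 'Date' column followed by exactly two count
    columns (west sidewalk, then east sidewalk).
    """
    # Fetch the file only if there is no cached copy (or a refresh is forced)
    if force_download or not os.path.exists(filename):
        if url is None:
            raise FileNotFoundError(f'{filename!r} not found and no url given')
        urlretrieve(url, filename)

    # Parse the Date column into a DatetimeIndex
    data = pd.read_csv(filename, index_col='Date', parse_dates=True)
    data.columns = ['West', 'East']  # assumed order of the raw count columns
    data['Total'] = data['West'] + data['East']
    return data
```

Caching the download this way means re-running the notebook does not fetch the data again each time; passing force_download=True refreshes the local copy.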

In [3]:
data = get_data()
data.head()


Out[3]:
                     West  East  Total
Date
2012-10-03 00:00:00   4.0   9.0   13.0
2012-10-03 01:00:00   4.0   6.0   10.0
2012-10-03 02:00:00   1.0   1.0    2.0
2012-10-03 03:00:00   2.0   3.0    5.0
2012-10-03 04:00:00   6.0   1.0    7.0

In [4]:
ax = data.resample('W').sum().plot()  # weekly totals for each column
# ax.set_ylim(0, None);  # optional: start the y-axis at zero

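resample('W') groups the hourly rows into calendar weeks, which by default end on Sunday and are labelled with that Sunday's date. A small check on made-up data (two full weeks of hourly counts of 1, starting Monday 2012-10-01) makes the binning visible: each weekly bin should total 7 × 24 = 168.

```python
import numpy as np
import pandas as pd

# Made-up data: a count of 1 for every hour from Monday 2012-10-01 00:00
# through Sunday 2012-10-14 23:00 (two full weeks)
index = pd.date_range('2012-10-01', periods=14 * 24, freq='h')
hourly = pd.DataFrame({'Total': np.ones(len(index))}, index=index)

# 'W' is shorthand for weeks ending on Sunday ('W-SUN'); each bin is
# labelled with its closing Sunday, and sum() adds up the hours inside it
weekly = hourly.resample('W').sum()
print(weekly)
```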


In [5]:
data.groupby(data.index.time).mean().plot();
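
Grouping on data.index.time uses only the time of day as the key, so all days fold onto a single 24-hour axis and mean() gives the average count for each hour. A sketch with synthetic data (three days sharing one hourly pattern, offset by 0, 10 and 20) shows the effect: each hour's mean is its base value plus 10.

```python
import numpy as np
import pandas as pd

# Made-up data: three days with the same hourly pattern 0, 1, ..., 23,
# shifted up by 0, 10 and 20 respectively
index = pd.date_range('2012-10-03', periods=3 * 24, freq='h')
counts = np.tile(np.arange(24.0), 3) + np.repeat([0.0, 10.0, 20.0], 24)
hourly = pd.DataFrame({'Total': counts}, index=index)

# index.time drops the date, so rows from different days with the same
# clock time fall into one group; mean() averages across those days
daily_profile = hourly.groupby(hourly.index.time).mean()
print(daily_profile.head())
```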



In [6]:
pivoted = data.pivot_table('Total', index=data.index.time, columns=data.index.date)
pivoted.iloc[:5, :5]


Out[6]:
          2012-10-03  2012-10-04  2012-10-05  2012-10-06  2012-10-07
00:00:00        13.0        18.0        11.0        15.0        11.0
01:00:00        10.0         3.0         8.0        15.0        17.0
02:00:00         2.0         9.0         7.0         9.0         3.0
03:00:00         5.0         3.0         4.0         3.0         6.0
04:00:00         7.0         8.0         9.0         5.0         3.0
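
pivot_table reshapes the long hourly series into a grid with one row per time of day and one column per date, which is what lets the next cell draw one line per day. On two days of synthetic hourly data the result should have shape (24, 2). With a single reading per cell, the default aggfunc='mean' simply passes each value through.

```python
import numpy as np
import pandas as pd

# Made-up data: hourly totals 0, 1, ..., 47 over two calendar days
index = pd.date_range('2012-10-03', periods=48, freq='h')
hourly = pd.DataFrame({'Total': np.arange(48.0)}, index=index)

# Rows: time of day (24 values); columns: calendar date (2 values).
# Each (time, date) pair holds one reading, so the default mean leaves it unchanged.
pivoted = hourly.pivot_table('Total', index=hourly.index.time,
                             columns=hourly.index.date)
print(pivoted.shape)
```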

In [7]:
pivoted.plot(legend=False, alpha=0.01);  # one faint line per day; overlapping lines reveal common daily patterns


Unit Testing

Using pytest to check that your functions work as intended.

When packaging a function, you want to be sure it does what it is supposed to do. Unit tests provide that check: each test calls a function on known input and asserts something about the result. They are especially useful when writing new code or changing existing code, since a failing test points directly at what broke.

pytest also reports how long the test run took, and the --durations=N option lists the N slowest tests. This gives a rough sense of which functions are slow, but it is not a substitute for proper benchmarking.

python -m pytest jupyterworkflow/
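
pytest collects files named test_*.py (or *_test.py) and runs the functions in them whose names start with test. A minimal test module might look like the following. The add_total helper and the file path jupyterworkflow/tests/test_total.py are hypothetical, and the sample values come from the first rows of Out[3]. In practice the helper would be imported from the package rather than defined in the test file; it is inlined here only to keep the sketch self-contained. Saved inside the package, it would be picked up by the command above.

```python
# Hypothetical test module, e.g. jupyterworkflow/tests/test_total.py
import pandas as pd


def add_total(data):
    """Hypothetical helper: return a copy of data with Total = West + East."""
    result = data.copy()  # avoid modifying the caller's DataFrame
    result['Total'] = result['West'] + result['East']
    return result


def make_sample():
    # First two rows of the hourly counts shown in Out[3]
    index = pd.to_datetime(['2012-10-03 00:00:00', '2012-10-03 01:00:00'])
    return pd.DataFrame({'West': [4.0, 4.0], 'East': [9.0, 6.0]}, index=index)


def test_add_total_columns():
    result = add_total(make_sample())
    assert list(result.columns) == ['West', 'East', 'Total']
    assert isinstance(result.index, pd.DatetimeIndex)


def test_add_total_values():
    result = add_total(make_sample())
    assert result['Total'].tolist() == [13.0, 10.0]
```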

